Abstract

The mixing time of an ergodic, reversible Markov chain can be bounded in
terms of the eigenvalues of the chain: specifically, the second-largest eigenvalue
and the smallest eigenvalue. It has become standard to focus only on the
second-largest eigenvalue, by making the Markov chain "lazy". (A lazy chain does nothing
at each step with probability at least $1/2$, and has only nonnegative eigenvalues.)

An alternative approach to bounding the smallest eigenvalue was given by
Diaconis and Stroock [5, Proposition 2] and Diaconis and Saloff-Coste [4, p. 702].
We give examples to show that using this approach it can be quite easy to obtain a
bound on the smallest eigenvalue of a combinatorial Markov chain which is several
orders of magnitude below the best-known bound on the second-largest eigenvalue.

1 Introduction 

Let $\mathcal{M}$ be an ergodic, reversible Markov chain with finite state space $\Omega$ and
transition matrix $P$. It is well known that the eigenvalues of $\mathcal{M}$ satisfy

\[ 1 = \lambda_0 > \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{N-1} > -1, \]

where $N = |\Omega|$. We refer to $\lambda_{N-1}$ as the smallest eigenvalue of $\mathcal{M}$.

The connection between the mixing time of a Markov chain and its eigenvalues is 
well-known (see [H| Proposition 1]): 

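\[ \tau(\varepsilon) \le (1 - \lambda_*)^{-1} \ln \frac{1}{\varepsilon\, \pi_{\min}} \qquad (1) \]

where $\tau(\varepsilon)$ denotes the mixing time of the Markov chain,
$\pi_{\min} = \min_{x \in \Omega} \pi(x)$ and

\[ \lambda_* = \max\{\lambda_1, |\lambda_{N-1}|\}. \]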
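As a rough numerical illustration of (1), the following Python sketch evaluates its right-hand side. The function name `mixing_time_bound` and its argument names are illustrative choices and do not come from the paper.

```python
import math


def mixing_time_bound(lambda_1: float, lambda_min: float,
                      eps: float, pi_min: float) -> float:
    """Right-hand side of (1): (1 - lambda_*)^{-1} * ln(1 / (eps * pi_min)).

    lambda_1   -- second-largest eigenvalue
    lambda_min -- smallest eigenvalue (lambda_{N-1})
    eps        -- target distance from stationarity
    pi_min     -- smallest stationary probability
    """
    lam_star = max(lambda_1, abs(lambda_min))
    if not 0.0 <= lam_star < 1.0:
        raise ValueError("lambda_* must lie in [0, 1)")
    return math.log(1.0 / (eps * pi_min)) / (1.0 - lam_star)


if __name__ == "__main__":
    # Same second-largest eigenvalue, different smallest eigenvalues.
    print(mixing_time_bound(0.5, -0.5, 0.01, 0.001))
    print(mixing_time_bound(0.5, -0.9, 0.01, 0.001))
```

In the second call $|\lambda_{N-1}|$ exceeds $\lambda_1$, so the smallest eigenvalue alone determines the bound. This is why it cannot simply be ignored when applying (1).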



When studying the mixing time of a Markov chain $\mathcal{M}$ using (1), the approach which
has become standard is to make the chain $\mathcal{M}$ lazy by replacing $P$ by $(I + P)/2$, where
$I$ denotes the identity matrix. Then all eigenvalues of the lazy chain are nonnegative,
and only the second-largest eigenvalue must be investigated.
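The effect of laziness on the spectrum can be checked numerically: each eigenvalue $\lambda$ of $P$ becomes $(1 + \lambda)/2$ for $(I + P)/2$. The sketch below is a minimal illustration, assuming a symmetric transition matrix (so the stationary distribution is uniform) and using a hand-written Jacobi rotation routine. The helper names `lazy` and `symmetric_eigenvalues` are hypothetical.

```python
import math


def lazy(P):
    """Return the lazy transition matrix (I + P) / 2."""
    n = len(P)
    return [[(P[i][j] + (1.0 if i == j else 0.0)) / 2 for j in range(n)]
            for i in range(n)]


def symmetric_eigenvalues(A, sweeps=100, tol=1e-20):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(A)
    A = [row[:] for row in A]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                diff = A[q][q] - A[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # apply the rotation to columns p and q
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # apply the rotation to rows p and q
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((A[i][i] for i in range(n)), reverse=True)


if __name__ == "__main__":
    # Simple random walk on a triangle: symmetric, ergodic, no self-loops.
    P = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
    print(symmetric_eigenvalues(P))
    print(symmetric_eigenvalues(lazy(P)))
```

For the triangle walk the eigenvalues are $1, -1/2, -1/2$, and the lazy version has eigenvalues $1, 1/4, 1/4$ (up to floating-point error).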

A lazy chain can be implemented so that its expected running time is the same as
the mixing time of the original chain. So the problem with lazy chains is not their
efficiency. In our opinion, the main problem with lazy Markov chains is conceptual: in
order to prove that a Markov chain is fast, we first slow it down. The device of using
lazy Markov chains has been called "crude" [15, p. 110] and "unnatural" [10, Chapter 5].

In this note, we aim to advertise an approach for bounding the smallest eigenvalue of
a Markov chain. This approach was first proposed by Diaconis and Stroock in 1991
[5, Proposition 2], and a modified version was presented by Diaconis and Saloff-Coste two
years later [4, p. 702] (restated as Lemma 1.1 below). The method of [I] has been applied
in [HEIE], but in the theoretical computer science community it has become common
to work with lazy chains. We urge researchers to first try the approach of [4, 5] before
choosing to work with a lazy version of their chain.

Finally we remark that in [8] the author wrongly claimed that their P, Lemma 1.3] 
was new, when in fact it is precisely the result of [4, p. 702]. We sincerely apologise for
this error. 

1.1 The method 



See [10] for Markov chain definitions not given here. Write $\mathcal{G}$ for the underlying directed
graph of the Markov chain $\mathcal{M}$, where $\mathcal{G} = (\Omega, \Gamma)$ and each directed edge $e \in \Gamma$
corresponds to a transition of $\mathcal{M}$. If $P(x, x) > 0$ then the edge $xx$ is called a self-loop at $x$.
Define $Q(e) = Q(x, y) = \pi(x) P(x, y)$ for the edge $e = xy$. A walk in $\mathcal{G}$ is a sequence of
states $x_0 x_1 \cdots x_\ell$ such that $P(x_j, x_{j+1}) > 0$ for $j = 0, \ldots, \ell - 1$. The walk is closed if
$x_\ell = x_0$. If a walk has odd length then we call it an odd walk.

For each $x \in \Omega$ let $w_x$ be an odd walk from $x$ to $x$ in $\mathcal{G}$. (Such a walk exists for each
$x$, since the Markov chain is aperiodic.) Define $W = \{w_x : x \in \Omega\}$, a set of "canonical
closed odd walks". For each transition $e \in \Gamma$ and each $w \in W$, let $r(e, w)$ denote the
number of times that $e$ appears as a directed edge of $w$. We can assume that $r(e, w) \le 2$
for all transitions $e$ (indeed, if $e$ is a self-loop then we can assume that $r(e, w) \le 1$).
The congestion of $W$, denoted by $\eta(W)$, is defined by



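\[ \eta(W) = \max_{e \in \Gamma} Q(e)^{-1} \sum_{x \in \Omega,\ e \in w_x} r(e, w_x)\, |w_x|\, \pi(x), \]

where $|w_x|$ denotes the length of the walk $w_x$.

Lemma 1.1. [4, p. 702] Suppose that $\mathcal{M}$ is a reversible, ergodic Markov chain with state
space $\Omega$, and let $W$ be a set of odd walks defined as above. Then

\[ (1 + \lambda_{N-1})^{-1} \le \frac{\eta(W)}{2}. \]

If $|w_x| = 1$ for all $x \in \Omega$ then the bound of Lemma 1.1 simplifies further to

\[ (1 + \lambda_{N-1})^{-1} \le \frac{1}{2} \max_{x \in \Omega} P(x, x)^{-1}. \qquad (2) \]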
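To see what (2) gives in the simplest case, the sketch below evaluates its right-hand side, read here as $\frac{1}{2} \max_x P(x,x)^{-1}$, and compares it with the exact value for the symmetric two-state chain, whose eigenvalues are $1$ and $2a - 1$. The names `loop_bound` and `two_state_chain` are illustrative only.

```python
def loop_bound(P):
    """Right-hand side of (2): (1/2) * max over states x of 1 / P(x, x)."""
    loops = [P[x][x] for x in range(len(P))]
    if min(loops) <= 0:
        raise ValueError("(2) requires a self-loop at every state")
    return 0.5 / min(loops)


def two_state_chain(a):
    """Symmetric two-state chain that stays put with probability a (0 < a < 1)."""
    return [[a, 1 - a], [1 - a, a]]


if __name__ == "__main__":
    a = 0.2
    P = two_state_chain(a)
    lam_min = 2 * a - 1              # the eigenvalues of P are 1 and 2a - 1
    print(1 / (1 + lam_min), loop_bound(P))
```

For this chain the two printed values coincide, so the bound (2) is attained and cannot be improved in general.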

Remark 1.2. Suppose that the graph underlying a Markov chain $\mathcal{M}$ can be obtained
from a connected bipartite graph by adding loops to an exponentially small proportion
of states. For example, many instances of the knapsack chain [13] satisfy this property.
Since every closed odd walk must traverse at least one of these self-loop edges, it is very
difficult to define a set of canonical closed odd walks with low congestion. So Lemma 1.1
is unlikely to be easy to apply in this case.

2 Applications of the method 

We illustrate the use of Lemma 1.1 by applying it to three combinatorial Markov chains.
Our applications are all ergodic and reversible with uniform stationary distribution, and
no edge will be used more than once in any walk $w_x$ that we define. In this case the
congestion can be simplified to

\[ \eta(W) = \max_{e \in \Gamma} P(e)^{-1} \sum_{x \in \Omega,\ e \in w_x} |w_x|, \qquad (3) \]

where $P(e) = P(x, y) = P(y, x)$ for the transition $e = xy$.
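The simplified congestion (3) is straightforward to evaluate for a small chain. The following is a minimal sketch, assuming the walks are supplied explicitly as lists of states. The function name `congestion_uniform` and the input format are assumptions made for illustration.

```python
from collections import defaultdict


def congestion_uniform(walks, P):
    """Evaluate (3): max over edges e of (1 / P(e)) * sum of |w_x| over walks through e.

    walks -- dict mapping each state x to its closed odd walk [x, ..., x]
    P     -- nested list or dict with P[u][v] the transition probability
    """
    load = defaultdict(int)
    for x, walk in walks.items():
        edges = list(zip(walk, walk[1:]))
        if walk[0] != x or walk[-1] != x or len(edges) % 2 == 0:
            raise ValueError(f"walk for {x!r} is not a closed odd walk")
        if len(set(edges)) != len(edges):
            raise ValueError("(3) assumes no edge is repeated within a walk")
        for e in edges:
            load[e] += len(edges)          # each walk through e contributes |w_x|
    return max(total / P[u][v] for (u, v), total in load.items())


if __name__ == "__main__":
    # Random walk on a triangle; every state walks once around the cycle.
    P = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
    walks = {x: [x, (x + 1) % 3, (x + 2) % 3, x] for x in range(3)}
    print(congestion_uniform(walks, P))
```

In this toy example all three walks use the same three directed edges, so each edge carries load $3 \cdot 3 = 9$ and the congestion is $9 / 0.5 = 18$.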

2.1 The switch chain for sampling regular graphs 

Our first application is to the Markov chain for sampling regular graphs known as the
switch chain. A transition of the chain is performed as follows: from the current state
$G$ (a $d$-regular graph on vertex set $[n]$) choose an unordered pair of non-incident edges
uniformly at random, and let $G'$ be the multigraph obtained from $G$ by deleting these edges
and inserting a perfect matching of their four endvertices, selected uniformly at random.
If $G'$ has no repeated edges then the new state is $G'$, otherwise it is $G$.
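One transition of the switch chain, as described above, can be sketched as follows. The graph representation (a frozenset of two-element frozensets) and the name `switch_step` are assumptions made for this illustration, and listing all non-incident pairs is only practical for small graphs.

```python
import random
from itertools import combinations


def switch_step(edges, rng=random):
    """One switch-chain transition on a simple graph (frozenset of frozenset edges)."""
    ordered = sorted(edges, key=sorted)
    # All unordered pairs of non-incident edges (four distinct endvertices).
    pairs = [(e, f) for e, f in combinations(ordered, 2) if not (e & f)]
    if not pairs:
        return edges
    e, f = rng.choice(pairs)
    a, b = sorted(e)
    c, d = sorted(f)
    # The three perfect matchings of {a, b, c, d}; the first restores e and f.
    matchings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    new1, new2 = (frozenset(p) for p in rng.choice(matchings))
    rest = edges - {e, f}
    if new1 in rest or new2 in rest:
        return edges                       # G' would have a repeated edge: stay at G
    return rest | {new1, new2}


if __name__ == "__main__":
    rng = random.Random(0)
    G = frozenset(frozenset((i, (i + 1) % 6)) for i in range(6))   # a 6-cycle
    for _ in range(5):
        G = switch_step(G, rng)
    print(sorted(sorted(e) for e in G))
```

Since one of the three matchings always restores the deleted pair, the chain stays at $G$ with probability at least $1/3$ at every step.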

The lazy version of this chain was analysed by Cooper et al. [2]. Clearly $P(G, G) \ge 1/3$
for every state $G$ of this chain, since one of the three perfect matchings of the four
endvertices restores the deleted edges. So by (2) we immediately conclude that

\[ (1 + \lambda_{N-1})^{-1} \le \frac{3}{2}. \]

This is several orders of magnitude smaller than the best-known bound on $(1 - \lambda_1)^{-1}$,
which is $O(d^{23} n^8)$ (see [2]).

2.2 Jerrum and Sinclair's matchings chain 

The next application is to the well-known Markov chain for sampling perfect and
near-perfect matchings of a fixed graph $G$. A transition of the chain is performed as follows:
from the current state $M$ (which is a perfect or near-perfect matching of $G$), choose
an edge $e \in E(G)$ uniformly at random. If $M$ is a perfect matching and $e \in M$ then
the new state is $M - \{e\}$. If $M$ is a near-perfect matching and both endvertices of $e$
are unmatched in $M$ then the new state is $M \cup \{e\}$. If $M$ is a near-perfect matching,
and exactly one endvertex of $e$ is unmatched in $M$ then let $e'$ be the edge of $M$ which
matches the other endvertex of $e$: the new state is $(M - \{e'\}) \cup \{e\}$. In all other cases
the new state is $M$.
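The case analysis above translates directly into code. The sketch below is one possible rendering, assuming edges are stored as tuples and the number of vertices `n` is even. The name `matching_step` and the data layout are illustrative, not taken from the paper.

```python
import random


def matching_step(graph_edges, M, n, rng=random):
    """One transition of the matchings chain.

    graph_edges -- list of edges (u, v) of the fixed graph G
    M           -- current perfect or near-perfect matching, a frozenset of such edges
    n           -- number of vertices of G (assumed even)
    """
    e = rng.choice(graph_edges)
    u, v = e
    matched = {w: f for f in M for w in f}     # vertex -> matching edge covering it
    if len(M) == n // 2:                       # M is a perfect matching
        return M - {e} if e in M else M
    u_free, v_free = u not in matched, v not in matched
    if u_free and v_free:                      # both endvertices unmatched: add e
        return M | {e}
    if u_free != v_free:                       # exactly one endvertex unmatched: slide
        e_prime = matched[v] if u_free else matched[u]
        return (M - {e_prime}) | {e}
    return M                                   # all other cases: stay at M


if __name__ == "__main__":
    rng = random.Random(1)
    E = [(u, v) for u in range(4) for v in range(u + 1, 4)]   # complete graph K4
    M = frozenset({(0, 1), (2, 3)})
    for _ in range(5):
        M = matching_step(E, M, 4, rng)
    print(sorted(M))
```

Whenever the chosen edge falls under "all other cases" the chain does not move, which is the source of the self-loop probability used in the next paragraph.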

The lazy version of this chain was analysed by Jerrum and Sinclair [11, 12]. If $G$
itself is not a perfect matching then $P(M, M) \ge 1/|E|$ for all states $M$ of the chain
(that is, for all perfect or near-perfect matchings $M$ of $G$). Therefore (2) implies that

\[ (1 + \lambda_{N-1})^{-1} \le \frac{|E|}{2}. \]

This bound is at least a factor $n^2$ smaller than the smallest-known bound on $(1 - \lambda_1)^{-1}$,
which is $O(n |E| q(n))$ for graphs $G$ for which the ratio between the number of
near-perfect and perfect matchings is $q(n)$ (see [12]).

2.3 A heat-bath chain for sampling contingency tables 



Our final application involves contingency tables. Let $r = (r_1, \ldots, r_m)$ and
$c = (c_1, \ldots, c_n)$ be two vectors of positive integers with the same sum. A contingency
table with row sums $r$ and column sums $c$ is an $m \times n$ matrix $X = (x_{i,j})$ with nonnegative
integer entries, such that $\sum_{j=1}^{n} x_{i,j} = r_i$ for $i = 1, \ldots, m$ and
$\sum_{i=1}^{m} x_{i,j} = c_j$ for $j = 1, \ldots, n$.
Let $\Omega_{r,c}$ denote the set of all contingency tables with row sums $r$ and column sums $c$.
To avoid trivialities we assume throughout this section that $\min\{m, n\} \ge 2$.

Dyer and Greenhill [6] proposed a Markov chain for sampling contingency tables,
which we will call the contingency chain. A transition of the chain is performed as
follows: choose a $2 \times 2$ subsquare of the current table uniformly at random, then replace
this $2 \times 2$ subsquare by a uniformly chosen $2 \times 2$ nonnegative integer matrix with the
same row and column sums.
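A heat-bath step of this kind can be sketched in a few lines. A $2 \times 2$ nonnegative integer matrix with given row and column sums is determined by its top-left entry, which ranges over an interval, so a uniform choice of that entry gives a uniform choice of the matrix. The function name `contingency_step` and the list-of-lists representation are assumptions for illustration.

```python
import random


def contingency_step(X, rng=random):
    """One transition of the contingency chain on a table X (list of rows)."""
    m, n = len(X), len(X[0])
    i, k = sorted(rng.sample(range(m), 2))     # uniformly random pair of rows
    j, l = sorted(rng.sample(range(n), 2))     # uniformly random pair of columns
    r1 = X[i][j] + X[i][l]                     # row sums of the 2x2 subsquare
    r2 = X[k][j] + X[k][l]
    c1 = X[i][j] + X[k][j]                     # first column sum of the subsquare
    # The top-left entry a determines the subsquare; all four entries are
    # nonnegative exactly when max(0, c1 - r2) <= a <= min(r1, c1).
    a = rng.randint(max(0, c1 - r2), min(r1, c1))
    Y = [row[:] for row in X]
    Y[i][j], Y[i][l] = a, r1 - a
    Y[k][j], Y[k][l] = c1 - a, r2 - c1 + a
    return Y


if __name__ == "__main__":
    rng = random.Random(2)
    X = [[2, 0, 1], [0, 3, 0], [1, 1, 2]]
    for _ in range(5):
        X = contingency_step(X, rng)
    print(X)
```

The current value of the top-left entry always lies in the sampled interval, so the step returns the same table with positive probability.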

The lazy contingency chain does nothing at each step with probability $1/2$, and
otherwise performs a transition as described above. Cryan et al. [3] analysed the lazy
contingency chain for a constant number of rows. They proved that $(1 - \lambda_1)^{-1} \le n^{f(m)}$
for $m$-rowed contingency tables with $n$ columns, where $m$ is constant and $f(m)$ is an
expression satisfying $f(m) \ge 68 m^4$. We now analyse the smallest eigenvalue of the
(non-lazy) contingency chain.

There is always a positive probability that the next state $X'$ of the contingency chain
is equal to the current state $X$, since the heat-bath step may simply replace the chosen
$2 \times 2$ subsquare with its current contents. However, the minimum of $P(X, X)$ over all
states $X$ depends on $r$ and $c$. (To see this, consider $2 \times 2$ squares.) We prefer a bound
which depends only on $m$ and $n$, and so we do not simply apply (2).

Lemma 2.1. Let $r = (r_1, \ldots, r_m)$ and $c = (c_1, \ldots, c_n)$ be vectors of positive integers
with a common sum which satisfy

\[ r_1 \ge r_2 \ge \cdots \ge r_m \quad \text{and} \quad c_1 \ge c_2 \ge \cdots \ge c_n. \]

Suppose that $\min\{r_1, c_1\} \ge 2$ and $\max\{m, n\} \ge 3$. The smallest eigenvalue of the
contingency chain on $\Omega_{r,c}$ satisfies

\[ (1 + \lambda_{N-1})^{-1} \le 45\, m^3 n^3. \]

Proof. Write $[a] = \{1, 2, \ldots, a\}$ for $a \in \mathbb{Z}^+$. Given $X = (x_{i,j}) \in \Omega_{r,c}$, first suppose that
there exists a 5-tuple $(i_1, i_2, i_3, j_1, j_2)$ such that

• $i_1, i_2, i_3$ are distinct elements of $[m]$,

• $j_1, j_2$ are distinct elements of $[n]$,

• $x_{i_1,j_1}$, $x_{i_2,j_1}$, $x_{i_3,j_2}$ are all positive.

Then $(i_1, i_2, i_3, j_1, j_2)$ is called row-good for $X$, and $X$ is called row-good. If $X$ is
row-good, fix the lexicographically least 5-tuple $(i_1, i_2, i_3, j_1, j_2)$ which is row-good for $X$ and
consider the following sequence of three transitions on the $3 \times 2$ subsquare defined by
rows $i_1, i_2, i_3$ and columns $j_1, j_2$:

\[
\begin{pmatrix} y_{1,1} & y_{1,2} \\ y_{2,1} & y_{2,2} \\ y_{3,1} & y_{3,2} \end{pmatrix}
\Rightarrow
\begin{pmatrix} y_{1,1} - 1 & y_{1,2} + 1 \\ y_{2,1} & y_{2,2} \\ y_{3,1} + 1 & y_{3,2} - 1 \end{pmatrix}
\Rightarrow
\begin{pmatrix} y_{1,1} & y_{1,2} \\ y_{2,1} - 1 & y_{2,2} + 1 \\ y_{3,1} + 1 & y_{3,2} - 1 \end{pmatrix}
\Rightarrow
\begin{pmatrix} y_{1,1} & y_{1,2} \\ y_{2,1} & y_{2,2} \\ y_{3,1} & y_{3,2} \end{pmatrix}
\]

(For notational convenience we have written $y_{k,\ell}$ for $x_{i_k, j_\ell}$ in the above.) Note that all
intermediate matrices are nonnegative, due to the row-good property. This defines a
walk $w_X$ of length 3 from $X$ to $X$ in the graph underlying the contingency chain.
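The three-step walk can be checked mechanically on a small example. The sketch below builds the four $3 \times 2$ matrices from a starting subsquare `y` and reports which rows change at each step. The helper names `row_good_walk` and `rows_changed` are hypothetical.

```python
def row_good_walk(y):
    """Return the four 3x2 matrices visited by the length-3 closed walk.

    y is the 3x2 subsquare (rows i1, i2, i3; columns j1, j2) and must satisfy
    y[0][0] >= 1, y[1][0] >= 1 and y[2][1] >= 1 (the row-good property).
    """
    offsets = [
        [[0, 0], [0, 0], [0, 0]],        # start: X
        [[-1, +1], [0, 0], [+1, -1]],    # after transition 1 (rows i1, i3)
        [[0, 0], [-1, +1], [+1, -1]],    # after transition 2 (rows i1, i2)
        [[0, 0], [0, 0], [0, 0]],        # after transition 3 (rows i2, i3): back at X
    ]
    return [[[y[i][j] + d[i][j] for j in range(2)] for i in range(3)]
            for d in offsets]


def rows_changed(a, b):
    """Indices of the rows in which the matrices a and b differ."""
    return [i for i in range(3) if a[i] != b[i]]


if __name__ == "__main__":
    walk = row_good_walk([[1, 0], [1, 0], [0, 1]])
    for before, after in zip(walk, walk[1:]):
        print(before, "->", after, "rows", rows_changed(before, after))
```

Each step alters exactly two rows and both columns, so it is a move within a single $2 \times 2$ subsquare, and the three positive entries required by the row-good property are exactly the ones that get decremented.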

We can define 5-tuples $(i_1, i_2, j_1, j_2, j_3)$ which are column-good for $X$ in the analogous
way, and say that $X$ is column-good if there is a 5-tuple which is column-good for $X$. If
$X$ is column-good then taking the transpose of each matrix in the sequence of transitions
above defines an odd walk $w_X$ of length 3 from $X$ to $X$.

Finally, suppose that $X \in \Omega_{r,c}$ is not row-good and is not column-good. Such an
$X$ is said to be bad. Then no row or column of $X$ contains more than one positive
entry. Since all row and column sums are positive, it follows that $m = n \ge 3$ and that
every row and column contains exactly one positive entry. Let $(i_1, i_2, i_3, j_1, j_2, j_3)$ be the
lexicographically least 6-tuple such that

• $i_1, i_2, i_3$ are distinct elements of $[m]$,

• $j_1, j_2, j_3$ are distinct elements of $[n]$,

• $x_{i_1,j_1} \ge 2$, while $x_{i_2,j_2}$ and $x_{i_3,j_3}$ are positive.

(The conditions on $r$ and $c$ guarantee that such a 6-tuple exists.) Consider the following
sequence of 5 transitions, performed on the $3 \times 3$ subsquare defined by rows $i_1, i_2, i_3$ and
columns $j_1, j_2, j_3$:

[Display: a sequence of six $3 \times 3$ matrices joined by five transitions, beginning and ending
with the diagonal matrix with entries $y_{1,1}, y_{2,2}, y_{3,3}$; the intermediate matrices did not
survive extraction.]

This defines a walk $w_X$ of length 5 from $X$ to $X$ in the graph underlying the chain.

Now we must analyse the set $W = \{w_X : X \in \Omega_{r,c}\}$ of odd walks defined above.
Let $e = (Z, Z')$ be a transition of the contingency chain. Then $Z$ and $Z'$ only differ in
a $2 \times 2$ subsquare defined by rows $i, i'$ and columns $j, j'$.

First we seek row-good $X$ with $e \in w_X$. Let $i'' \notin \{i, i'\}$ be another row index, and fix
one of the 6 ways to arrange $(i, i', i'', j, j')$ as $(i_1, i_2, i_3, j_1, j_2)$. This gives enough
information to uniquely identify a potential candidate for $X$. For example, if the transition
$e$ involves rows $i_1$ and $i_3$ then $X = Z$, while if the transition $e$ involves rows $i_2$ and $i_3$
then $X = Z'$. If $e$ involves rows $i_1$ and $i_2$ then $e$ is the second transition in the sequence,
and $X$ can be obtained from $Z$ by reversing the first transition in the sequence: namely,
adding 1 to entries $(i_1, j_1)$ and $(i_3, j_2)$ and subtracting 1 from entries $(i_1, j_2)$ and $(i_3, j_1)$.
If $X$ is a valid contingency table then $(i_1, i_2, i_3, j_1, j_2)$ is row-good for $X$. If it is the
lexicographically least such 5-tuple for $X$ then $e \in w_X$. This identifies at most $12(m - 2)$
tables $X$ such that $e \in w_X$. (This is an overcount, but good enough for our purposes.)

By choosing a third column index $j'' \notin \{j, j'\}$, an analogous argument shows that
there are at most $12(n - 2)$ column-good tables $X$ with $e \in w_X$.

Finally, we seek bad tables $X$ such that $e \in w_X$. Choose a row index $i'' \notin \{i, i'\}$ and a
column index $j'' \notin \{j, j'\}$, and fix one of the at most 36 ways to arrange $(i, i', i'', j, j', j'')$
as $(i_1, i_2, i_3, j_1, j_2, j_3)$. Now each transition in the sequence alters a different $2 \times 2$
subsquare except the first and fourth, which both alter rows $i_1, i_2$ and columns $j_1, j_2$. Hence,
arguing as above, there are at most two choices for $X$, for each fixed 6-tuple. This gives
at most $72(m - 2)(n - 2)$ bad tables $X$ such that $e \in w_X$.

Combining all this, we find that the congestion parameter $\eta(W)$ satisfies

\ ( 1 (36(m - 2) + 36(n - 2) + 360(m - 2)(n - 2)) < 90 m 3 n 3 , 

and applying Lemma 1.1 completes the proof. □
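The last step of the proof rests on a short calculation, spelled out here as a sketch. It bounds the bracketed sum of walk counts weighted by walk lengths, using only the standing assumption $\min\{m, n\} \ge 2$. The name $S$ is introduced for illustration.

```latex
\begin{align*}
S &= 36(m-2) + 36(n-2) + 360(m-2)(n-2) \\
  &= 360\,mn - 684\,(m+n) + 1296 \\
  &\le 360\,mn - 684 \cdot 4 + 1296 \qquad (\text{since } m + n \ge 4) \\
  &= 360\,mn - 1440 \;<\; 360\,mn .
\end{align*}
```

Here $36 = 3 \cdot 12$ and $360 = 5 \cdot 72$ combine the counts of tables with the walk lengths 3 and 5. Any prefactor of at most $m^2 n^2 / 4$ multiplying $S$ therefore gives a quantity below $90\, m^3 n^3$.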

Again we observe that this bound on $(1 + \lambda_{N-1})^{-1}$ is several orders of magnitude
lower than the best-known bound on the second-largest eigenvalue [3].

Remark 2.2. It has recently been shown J3j that the contingency chain described above 
has no negative eigenvalues. We include Lemma 2.1 here to illustrate an application of
Lemma 1.1 involving walks of length greater than one.